fix(terminal): kill entire process group on Stop so child commands are not orphaned#525
fix(terminal): kill entire process group on Stop so child commands are not orphaned#525awschmeder wants to merge 7 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughSpawn subprocesses with detached:true (non-Windows), force UTF‑8 env and stdin ignored; abort() targets the detached process group (POSIX) or uses taskkill (Windows) with subprocess fallback; run() emits a SIGKILL-like completion when aborted; tests, deps, and a changeset updated. ChangesProcess Group Termination Fix
Sequence DiagramsequenceDiagram
participant Caller
participant ExecaTerminalProcess
participant DetachedGroup as Detached_Process_Group
participant Subprocess
Caller->>ExecaTerminalProcess: abort()
ExecaTerminalProcess->>ExecaTerminalProcess: check if pid exists
alt pid available
ExecaTerminalProcess->>DetachedGroup: process.kill(-pid, SIGKILL)
DetachedGroup->>DetachedGroup: grouped processes receive SIGKILL
else kill fails
ExecaTerminalProcess->>Subprocess: subprocess.kill(SIGKILL)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
.changeset/kill-orphaned-child-processes.md (1)
5-5: ⚡ Quick winImprove grammar for clarity.
The phrase "are not orphaned and allowed to continue running" is awkward. Consider rephrasing for better clarity.
📝 Suggested rewording
-fix(terminal): kill entire process group on Stop so child commands (e.g. `sleep 30`) are not orphaned and allowed to continue running after user clicked 'Stop'. +fix(terminal): kill entire process group on Stop so child commands (e.g. `sleep 30`) are not orphaned or allowed to continue running after user clicked 'Stop'.Or alternatively:
-fix(terminal): kill entire process group on Stop so child commands (e.g. `sleep 30`) are not orphaned and allowed to continue running after user clicked 'Stop'. +fix(terminal): kill entire process group on Stop to prevent orphaned child commands (e.g. `sleep 30`) from continuing after user clicked 'Stop'.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.changeset/kill-orphaned-child-processes.md at line 5, Update the changelog entry text in .changeset/kill-orphaned-child-processes.md: replace the awkward phrase "are not orphaned and allowed to continue running after user clicked 'Stop'" with a clearer rewording such as "so child commands (e.g. `sleep 30`) are not left running after the user clicks 'Stop'" or alternatively "so child commands (e.g. `sleep 30`) are terminated when the user clicks 'Stop'"; edit the diff line starting with "fix(terminal): kill entire process group on Stop..." to use one of these clearer phrasings.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.changeset/kill-orphaned-child-processes.md:
- Around line 1-3: The changeset description uses awkward wording ("and allowed
to continue running") for the "zoo-code" package; update the .changeset entry
text so it explicitly states what the fix prevents (for example: "prevents child
processes from being orphaned and continuing to run" or similar), replacing the
vague phrase with a clear description of the bug and its remediation in the
kill-orphaned-child-processes changeset.
In `@src/integrations/terminal/ExecaTerminalProcess.ts`:
- Around line 84-94: The abort branch returns early after awaiting
this.subprocess and emitting "shell_execution_complete", but it skips the
cleanup performed later (setActiveStream(undefined), stopHotTimer(),
emit("completed", ...), and clearing subprocess), leaving the terminal stuck
busy; before returning from the if (this.aborted) block, invoke the same cleanup
steps used on normal completion: call setActiveStream(undefined),
stopHotTimer(), emit("completed", { exitCode: 137, signalName: "SIGKILL" }) (or
ensure the completed event contains the same payload), and set this.subprocess =
undefined, then return.
---
Nitpick comments:
In @.changeset/kill-orphaned-child-processes.md:
- Line 5: Update the changelog entry text in
.changeset/kill-orphaned-child-processes.md: replace the awkward phrase "are not
orphaned and allowed to continue running after user clicked 'Stop'" with a
clearer rewording such as "so child commands (e.g. `sleep 30`) are not left
running after the user clicks 'Stop'" or alternatively "so child commands (e.g.
`sleep 30`) are terminated when the user clicks 'Stop'"; edit the diff line
starting with "fix(terminal): kill entire process group on Stop..." to use one
of these clearer phrasings.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 0f419096-4cc7-4a8c-b2e9-6b2d69664db3
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (4)
.changeset/kill-orphaned-child-processes.mdsrc/integrations/terminal/ExecaTerminalProcess.tssrc/integrations/terminal/__tests__/ExecaTerminalProcess.spec.tssrc/package.json
💤 Files with no reviewable changes (1)
- src/package.json
| --- | ||
| "zoo-code": patch | ||
| --- | ||
|
|
There was a problem hiding this comment.
No need to commit a changeset file
navedmerchant
left a comment
There was a problem hiding this comment.
detached: true would briefly flash a terminal in windows causing a jarring user experience. I am not sure if there is a good solution to this problem.
…st (race condition)
…th and improve changeset wording
On Windows, process.kill(-pid) is not available (POSIX-only). Add a platform guard in abort() that uses spawnSync taskkill /F /T /PID to kill the full process tree (shell + all child commands) on win32. Adds child_process mock and platform-guarded Windows unit tests (skipped on non-win32).
e80b4e1 to
cd4af7a
Compare
|
Note on Windows behavior: The path added in the latest commit has not been manually tested on a Windows machine by the author. The unit tests for it are present but skipped on non-win32 (they exercise the mocked call, not the real binary). I will test it on Windows and confirm. |
Related GitHub Issue
Closes: #524
Description
When Execa spawns a command with
shell: true, Node forks a shell process which in turn forks the actual command. Killing only the shell PID leaves its children running -- the shell exits but does not propagate the signal to its children, so processes likesleep 30keep going until they finish naturally.How the fix works:
detached: trueto the Execa options. This causes the OS to place the shell and all of its descendants into a new process group, with the shell as the group leader. On Windows,detached: trueis omitted -- it causes a console-window flash thatwindowsHidecannot reliably suppress, and it is not needed sincetaskkill /F /Thandles tree-kill without a detached process.abort(), branch on platform:process.kill(-pid, "SIGKILL")(negative PID = kill the entire process group). This atomically signals every process in the group -- the shell, its children, grandchildren, and any background jobs -- with no race condition.spawnSync("taskkill", ["/F", "/T", "/PID", pid]). The/Tflag kills the full process tree (shell + all descendants).process.kill(-pid)is POSIX-only and not available on Win32.{ exitCode: 137, signalName: "SIGKILL" }instead ofexitCode: 0, so the UI running-indicator dot correctly turns red (consistent with the integrated terminal path).ps-treedependency and the associated 100 ms delayed PID-tracking logic, which was the previous (incomplete) attempt at this fix.Reviewers should note:
detached: trueoption is required on POSIX forprocess.kill(-pid, ...)to work. Without it the subprocess is in the parent's process group and a negative-PID kill would affect unrelated processes.subprocess.kill("SIGKILL")is retained for the edge case where the primary kill throws (e.g. ESRCH -- process already exited).TerminalProcess) is not changed by this PR. It sends Ctrl+C viasendTextand cannot reach background jobs inside the PTY. This is a known limitation documented in the test results file.taskkillpath has not been manually tested by the author -- unit tests covering the mockedspawnSynccall are present (platform-guarded, skipped on non-win32). Community verification welcome.Test Procedure
Automated tests (18 passing, 2 skipped on non-win32):
New tests cover:
taskkill /F /T /PID(skipped on non-win32)subprocess.killwhen primary kill throws (both platforms)exitCode: 137emitted on abortabortedflag is checkedManual test (macOS/Linux):
sleep 30ps aux | grep sleep | grep -v grepTested successfully:
sleep 30, nestedsh -c 'sh -c "sleep 30"', loop with sleeps.Note: background jobs via
sh -c 'sleep 15 & sleep 15 & sleep 15 & wait'are killed correctly when using the inline terminal, but are not killed when using integrated terminal due to fundamental limitations with the integrated terminal path.Pre-Submission Checklist
Screenshots / Videos
No UI changes beyond the running-indicator dot color correction (green -> red on abort).
Documentation Updates
Additional Notes
ps-treeand@types/ps-treeare removed fromsrc/package.jsonas they are no longer used.detached: trueis set only on POSIX (where it is required forprocess.kill(-pid, "SIGKILL")to target only the spawned process group, not the Node.js parent process). On Windows it is skipped:detached: truetriggers a console-window flash thatwindowsHidecannot reliably suppress due to a race in the WindowsCreateProcessAPI.taskkill /F /T /PIDachieves the equivalent tree-kill.taskkill.exeis a Win32 system binary available on all Windows versions since XP, invoked directly viaspawnSync(bypasses the user's configured shell). Note:taskkill /Twalks the PPID tree rather than targeting a kernel-tracked group, so it is susceptible to orphaned descendants if an intermediate process exits before Stop is clicked. This is strictly less racy than the previousps-tree-based approach, which was identically vulnerable but introduced an extra 100 ms delay that widened the race window. A fully race-free Windows solution would require a helper utility to assign processes to a Windows Job Object (which the kernel tracks independently of the PPID chain), but that is out of scope for this PR.